Metadata Training #611

FriederikeHanssen · 2025-05-16T11:27:51Z

Closes #520

This PR adds a training for meta maps. There is a solid overlap with the working-with-files training for things like parsing samplesheets, getting values from them, and filtering. As we come up with new courses, we need to decide which of them comes first and can be used as a basis.

- Using maps a lot.
- I am a bit on the fence about using the ternary operator here. Maybe it is too much; on the other hand, it is absolutely everywhere, so we have to include it at some point.
- Sorting the side quests alphabetically and adding a missing one to the overview page.

netlify · 2025-05-16T11:27:57Z

✅ Deploy Preview for nextflow-training ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `54c9d4c` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/nextflow-training/deploys/682e0757693fe40008bc6010 |
| 😎 Deploy Preview | https://deploy-preview-611--nextflow-training.netlify.app |

adamrtalbot

There's some heavy overlap with splitting and joining; I might be tempted to reduce it and stick with using meta values.

I'd simplify the ternary stuff; I think it takes attention away from metamaps.

docs/side_quests/metadata.md

pinin4fjords

Nice work, and a great simple subject to teach with! In general:

- Having the 'after' before the 'before' in examples makes me go a bit cross-eyed.
- You don't need to include such large sections of text in the 'before', just enough to anchor the reader in the text. Copying whole blocks repeatedly is hard to parse, and hard for us to maintain!
- It definitely is 'romance' rather than 'romanic'.

docs/side_quests/metadata.md

pinin4fjords · 2025-05-22T07:50:01Z

docs/side_quests/metadata.md

+
+### 1.2 Separate meta data and data
+
+In the samplesheet, we have both the input files and data about the input files (`id`, `character`), the meta data. As we progress through the workflow, we generate more meta data about each sample. To avoid having to keep track of how many fields we have at any point in time and making the input of our processes more robust, we can combine the meta information into its own key-value paired map.


Suggested change

-In the samplesheet, we have both the input files and data about the input files (`id`, `character`), the meta data. As we progress through the workflow, we generate more meta data about each sample. To avoid having to keep track of how many fields we have at any point in time and making the input of our processes more robust, we can combine the meta information into its own key-value paired map.
+In the samplesheet, we have both data (input files) and associated metadata (`id`, `character`). We'll be referencing and adding to the metadata, but it will be the data files themselves that form the core of our processing. As such, it's important to separate data and metadata early on.

Just my take on a revised motivation, disregard at will.
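
To make the data/metadata split in the wording above concrete, here is a minimal sketch in plain Python rather than the Nextflow code from the training itself. Only the `id` and `character` columns come from the text under review; the `recording` column, the sample values, the `lang` field, and both helper functions are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical samplesheet: `id` and `character` are named in the training text;
# the `recording` column and every value below are made up for illustration.
SAMPLESHEET = """id,character,recording
sampleA,squirrel,/data/a.txt
sampleB,tux,/data/b.txt
"""

META_KEYS = ("id", "character")  # columns treated as metadata


def split_meta_and_data(row, data_key="recording"):
    """Return a (meta, data) pair: a metadata map plus the data file path."""
    meta = {key: row[key] for key in META_KEYS}
    return meta, row[data_key]


def add_meta(meta, **extra):
    """Return a new metadata map with extra fields; the input map is unchanged."""
    return {**meta, **extra}


rows = list(csv.DictReader(io.StringIO(SAMPLESHEET)))
pairs = [split_meta_and_data(row) for row in rows]

meta, data = pairs[0]
enriched = add_meta(meta, lang="de")  # `lang` is a hypothetical derived field
print(enriched, data)
```

Whatever fields are added later, each element keeps the same two-part shape (one metadata map, one data file), which is the robustness argument made in the original paragraph.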

docs/side_quests/metadata.md

FriederikeHanssen · 2025-07-16T18:31:13Z

> Having the 'after' before the 'before' in examples makes me go a bit cross-eyed

I kinda agree, although I think it works in teaching, because people don't need to click to get the "new" code. We do it in other places in the training too, but we are inconsistent in how we format it. Maybe we need an overarching push for consistency here.

@pinin4fjords

* Add working with files side quest
  Adds a side quest for working with files, vaguely based on the Metadata propagation section of the advanced training.
  Story:
  - Create `file` object from string
  - Look at `file` object attributes
  - Extract sample metadata from filename
  - Use `Channel.fromPath` to create a channel of files
  - Extract sample metadata from filename within map operator
  - Use `Channel.fromFilePairs` to create a channel of file pairs
  - Use `publishDir` to save results
  Should help introduce the concept of handling files better.
  Problems:
  - Doesn't ram home the "always use files as inputs!!!" message. We could do that in the final section on processes?
* Metadata Training (#611)
* rough plan for metadata training
* solution for training
* update solution
* first half of the text
* add intro and second half of training
* add to side bar
* linting fixes
* linting fixes
* add sumary and key concepts
* linting. again.
* fix line highlights
* fix line highlights, formatting
* fix indents
* flesh out some of the explanations
* fix text
* Add working with files side quest (#601)
* Add working with files side quest
  Adds a side quest for working with files, vaguely based on the Metadata propagation section of the advanced training.
  Story:
  - Create `file` object from string
  - Look at `file` object attributes
  - Extract sample metadata from filename
  - Use `Channel.fromPath` to create a channel of files
  - Extract sample metadata from filename within map operator
  - Use `Channel.fromFilePairs` to create a channel of file pairs
  - Use `publishDir` to save results
  Should help introduce the concept of handling files better.
  Problems:
  - Doesn't ram home the "always use files as inputs!!!" message. We could do that in the final section on processes?
* fixups and make paths relative to codespaces
* Add real FASTQ data
* Reduce code blocks for clarity
* Typo in splitting and grouping path
* Remove splitting and grouping (wrong branch)
* Clarify the difference between strings and files in Nextflow
* Clarify files in Nextflow
* Correct summary sentence to reflect code, it was copy+pasted wrong
* Update docs/side_quests/working_with_files.md
  Co-authored-by: Friederike Hanssen <friederike.hanssen@seqera.io>
* Clarify why we want to flatten tuples
* Simplify the map operation part
* Remove reference to Groovy methods
* Fix indentation
* Conver data to metamap to introduce metamaps early
* Refine summary points
* Use markdown numbering
* Add link to file properties documentation
* Explain multiple assignment better
* Update docs/side_quests/working_with_files.md
  Co-authored-by: Maxime U Garcia <maxime.garcia@seqera.io>
* Add Channel.fromFilePairs docs as link to fromFilePairs section
* Added glob explanation as a note
* Added glob explanation as a note
* Use bullet points instead of numbers for key concepts of working with files
* working with files use before-after syntax correctly
* working with files add highlighting
* working with files highlighting fixup
* Fixups: Respond to review comments

---------
Co-authored-by: Friederike Hanssen <friederike.hanssen@seqera.io>
Co-authored-by: Maxime U Garcia <maxime.garcia@seqera.io>

* Intermediate training Reorganisation (Meta maps, working with files, splitting & grouping) (#636)
* Bump NXF_VER to latest stable (25.04.3) (#618)
* Bump NXF_VER to latest stable (25.04.3)
* Set devcontainer image from base:ubuntu to base:dev-ubuntu-24.04

---------
Co-authored-by: Marcel Ribeiro-Dantas <marcel@seqera.io>

* Hello Nextflow: Transcripts (#615)
* Add transcripts for Hello Nextflow videos
* Check headings: ignore transcript files

---------
Co-authored-by: Geraldine Van der Auwera <geraldinevdauwera@gmail.com>

* Small fixes to Hello Nextflow outputs (#620)
* Update outputs
* Update script to match training content

---------
Co-authored-by: Marcel Ribeiro-Dantas <marcel@seqera.io>

* Minor link fixes (#621)
* Minor link fixes
* Delete nf-develop mention
* Update French translation for the Home and Help pages (#622)
* Fix buildx error when image contains attestation manifest (#623)
  Co-authored-by: Marcel Ribeiro-Dantas <marcel@seqera.io>
* Address #491 - Selecting a Nextflow version and Environment variables (#627)
* Address #491 - Selecting a Nextflow version and Environment variables
* removing Environment section
* Prettify orientation file

---------
Co-authored-by: Marcel Ribeiro-Dantas <marcel.ribeirodantas@seqera.io>

* expand intro to go from samplespecific data to the term meta data
* rephrase intro
* fix wording to romance
* remove ternary operator to focus on meta maps, instead highlight merging of maps
* remove ternary operator from learnings
* move map explanation up
* First pass of the Nextflow Run course (#626)
  Abridged version of Hello Nextflow focused on running rather than developing.
  - Run basic operations (Hello World level)
  - Run pipelines (channels for multiple inputs, multi-step example, modules, containers)
  - Configuration
* remove filtering from the metamap training.
* remove join, filter, fix language, line numbers
* fix numbering
* add samplesheet parsing into starter file
* update solution with simplified training
* abbreviate title
* add trailing dot to make the linter happy
* very minor refactoring, simplifying the quests to how to get to a meta data map
* fix linting
* closing message
* ad solution
* use meta maps
* Update docs/side_quests/working_with_files.md
  Co-authored-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
* remove duplication from sidebar
* add working with files to side quest overview
* remove more duplications
* remove section 5.2/5.3, add solution
* linting
* prettier linting
* simplify if/else more
* formatting
* linting
* add meta data training as prereq
* Adding posthog (#640)
* Adding GTM
* Add custom GTM integration for MkDocs
  - Create custom GTM analytics partial to avoid posthog-js dependency issues
  - Maintain GTM integration while using Material for MkDocs supported approach
  - Remove empty package-lock.json that was causing build confusion
* Updating just posthog
* Updating just posthog
* Fix linting error

---------
Co-authored-by: Marcel Ribeiro-Dantas <marcel@seqera.io>

* Add a proper intro to IDE features (#632)
* Add ide intro
* Correct heading number
* Fix numbering
* Add a bunch of images
* Prettier
* Add to index
* Crop some images for renamed file
* Minor fixups including:
  - Clarifying ctrl/cmd key usage
  - Fixing some language
  - Removing IDE from title
* Some more fixups and adjustments based on a run through
* fixup numbering
* Couple more feedback tweaks
* Resove extensions panel opening via icon
* typo
* Move Cmd admonition, add files icon, tidy up Cntrl/ Cmd
* Update syntax showcase image
* Fix operators autocomplete image
* Fix task autocomplete
* Fix config autocomplete
* correct link following
* Better resolve linking / process inspecting
* Misc fixes
* Fix script
* Text improvement
* More minor text fixes
* Add source control screenshot
* Strip AI coverage in IDE section due to plugin bug
* Clarify terminal shortcut
* Add note on structure
* Fix numbering
* Update docs/side_quests/ide_features.md

---------
Co-authored-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Co-authored-by: Friederike Hanssen <friederike.hanssen@seqera.io>

---------
Co-authored-by: Geraldine Van der Auwera <geraldinevdauwera@gmail.com>
Co-authored-by: Marcel Ribeiro-Dantas <marcel@seqera.io>
Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
Co-authored-by: Marcel Ribeiro-Dantas <mribeirodantas@seqera.io>
Co-authored-by: Kristina Gagalova <kristina.gagalova@curtin.edu.au>
Co-authored-by: Marcel Ribeiro-Dantas <marcel.ribeirodantas@seqera.io>
Co-authored-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Co-authored-by: mavi-sqr <marta.vidal@seqera.io>
Co-authored-by: Jonathan Manning <jonathan.manning@seqera.io>

* docs(side_quests): add remote file usage section to working with files guide
  - Add comprehensive documentation on using remote files with URIs
  - Include examples for HTTP, S3, Azure, and Google Cloud Storage protocols
  - Demonstrate seamless switching between local and remote data
  - Update summary and key concepts sections with remote file examples
  - Show how Nextflow automatically handles file staging and caching
* refactor(docs): clean up side quests navigation and remove duplicate files
  - Remove duplicate navigation entries for nf-test.md and nf-core.md
  - Delete outdated splitting-and-grouping.md file
  - Reorder side quests navigation for better organization
  - Fix navigation structure to eliminate redundancy
* Apply suggestions from code review
  Co-authored-by: Jonathan Manning <jonathan.manning@seqera.io>
* replace sample with file or data in the main text, use consistent spelling for datasheet and metadata
* fix indentation & format solution. Add fix for changed if clause to adhere to language server
* add in problem statement
* adding @pinin4fjords suggestions on emphasis
* add map example
* replace attributes/properties with metadata
* add publishDir right away
* add in reference to splitting and grouping suggested by @pinin4fjords
* tweak wording suggested by @pinin4fjords
* Update docs/side_quests/metadata.md
  Co-authored-by: Jonathan Manning <jonathan.manning@seqera.io>
* add a follow on sentence on the balance between meta maps and explicit inputs
* fix linting
* sample -> patient
* Promote remote files coverage
* Add object type coverage
* Do better with strings vs Path objects
* link out to path docs
* Add example to illustrate file handling in a process
* Formatting fix
* Update new example
* Formatting fixes
* Formatting fixes
* update title
* Correct highlights
* Remove duplicate content
* Explain debug
* fix path
* More formatting fixes
* More formatting fixes
* Cover string class
* main.nf -> file_operations.nf
* main.nf -> file_operations.nf
* Fix link
* Continuity fixes
* Apply minor suggestions from code review
* Link to 'working with metadata'
* Fix title
* Fix takeaways
* Remove lingering 'sample' references
* Update learning outcomes
* First batch of edits from final review
* Minor fixes, and we need pipefail
* fix nextflow versions
* Remove messy bolds
* Declare files inside workflow block
* attempt indent fixes
* More fixes
* Files fixes including summary
* Prettier
* Initial cleanup of splitting/ grouping
* syntax fixes
* Convert to file object and link out
* Fix csv name and nextflow versions
* Intro tweaks
* misc fixes
* Add map explanation
* More SnG tweaks
* More SnG tweaks
* More SnG tweaks
* More SnG tweaks
* Fix line nums
* Fix line nums
* tumour -> tumor
* Fix line nums
* More SnG tweaks
* Misc fixes
* Misc fixes
* More fixes
* Fix up takeaways
* AI-assisted prose cleanup
* Fix up code block titles
* Tweak summary
* Smooth transitions
* Update solution
* Prettier

---------
Co-authored-by: Friederike Hanssen <friederike.hanssen@seqera.io>
Co-authored-by: Maxime U Garcia <maxime.garcia@seqera.io>
Co-authored-by: Geraldine Van der Auwera <geraldinevdauwera@gmail.com>
Co-authored-by: Marcel Ribeiro-Dantas <marcel@seqera.io>
Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
Co-authored-by: Marcel Ribeiro-Dantas <mribeirodantas@seqera.io>
Co-authored-by: Kristina Gagalova <kristina.gagalova@curtin.edu.au>
Co-authored-by: Marcel Ribeiro-Dantas <marcel.ribeirodantas@seqera.io>
Co-authored-by: mavi-sqr <marta.vidal@seqera.io>
Co-authored-by: Jonathan Manning <jonathan.manning@seqera.io>

rough plan for metadata training

51db901

FriederikeHanssen added 14 commits May 20, 2025 16:26

solution for training

8fcc998

update solution

bdcb319

first half of the text

14ae39d

add intro and second half of training

3296c1c

add to side bar

d9204a0

linting fixes

36507b0

linting fixes

23d5e1f

add sumary and key concepts

3299b51

linting. again.

9f15d58

fix line highlights

e24e2e5

fix line highlights, formatting

0d53fd6

fix indents

909e36c

flesh out some of the explanations

b50a976

fix text

54c9d4c

FriederikeHanssen marked this pull request as ready for review May 21, 2025 17:07

FriederikeHanssen requested review from adamrtalbot and vdauwera and removed request for adamrtalbot May 21, 2025 17:07

adamrtalbot reviewed May 21, 2025

pinin4fjords reviewed May 22, 2025

vdauwera added the side quests label Jun 3, 2025

FriederikeHanssen changed the base branch from master to intermediate_training June 5, 2025 11:06

FriederikeHanssen merged commit df53a2c into nextflow-io:intermediate_training Jul 11, 2025
5 checks passed

FriederikeHanssen mentioned this pull request Jul 25, 2025

Intermediate training Reorganisation (Meta maps, working with files, splitting & grouping) #636

Merged

FriederikeHanssen mentioned this pull request Aug 5, 2025

Intermediate training draft #644

Merged
